Goto

Collaborating Authors

 kernel low-rank approximation


Is Input Sparsity Time Possible for Kernel Low-Rank Approximation?

Neural Information Processing Systems

Low-rank approximation is a common tool used to accelerate kernel methods: the $n \times n$ kernel matrix $K$ is approximated via a rank-$k$ matrix $\tilde K$ which can be stored in much less space and processed more quickly. In this work we study the limits of computationally efficient low-rank kernel approximation. We show that for a broad class of kernels, including the popular Gaussian and polynomial kernels, computing a relative error $k$-rank approximation to $K$ is at least as difficult as multiplying the input data matrix $A \in R^{n \times d}$ by an arbitrary matrix $C \in R^{d \times k}$. Barring a breakthrough in fast matrix multiplication, when $k$ is not too large, this requires $\Omega(nnz(A)k)$ time where $nnz(A)$ is the number of non-zeros in $A$. This lower bound matches, in many parameter regimes, recent work on subquadratic time algorithms for low-rank approximation of general kernels [MM16,MW17], demonstrating that these algorithms are unlikely to be significantly improved, in particular to $O(nnz(A))$ input sparsity runtimes. At the same time there is hope: we show for the first time that $O(nnz(A))$ time approximation is possible for general radial basis function kernels (e.g., the Gaussian kernel) for the closely related problem of low-rank approximation of the kernelized dataset.


Reviews: Is Input Sparsity Time Possible for Kernel Low-Rank Approximation?

Neural Information Processing Systems

The paper presents negative and positive results to the problem of computing low-rank approximations to the kernel matrix of a given dataset. More specifically, for a fixed kernel function and given some input data and a desired rank k, the authors consider the two related problems of (1) computing a rank k matrix which is relatively close to the optimal rank k approximation of the kernel matrix and of (2) computing an orthonormal basis of a k-dimensional space such that when projecting the data from the feature space (defined by the kernel) down to this k-dimensional space it is relatively close to the optimal such projection. In particular, the paper focuses on the question whether it is possible to solve these problems in a time that is independent of the dimensionality of the input data matrix A (say n times d) and instead depends on the number of non-zero entries in A. Regarding problem (1) the paper provides a negative answer for the linear kernel, all polynomial kernels, and the Gaussian kernel by showing that the problem is as hard as computing (exactly) the product of the input matrix with an arbitrary matrix C (of dim d times k). This implies that a solution cannot be achieved without a significant improvement of the state of the art in matrix multiplication. Regarding problem (2) the paper provides a positive result for shift-invariant kernels using an algorithm based on random Fourier features.


Is Input Sparsity Time Possible for Kernel Low-Rank Approximation?

Musco, Cameron, Woodruff, David

Neural Information Processing Systems

Low-rank approximation is a common tool used to accelerate kernel methods: the $n \times n$ kernel matrix $K$ is approximated via a rank-$k$ matrix $\tilde K$ which can be stored in much less space and processed more quickly. In this work we study the limits of computationally efficient low-rank kernel approximation. We show that for a broad class of kernels, including the popular Gaussian and polynomial kernels, computing a relative error $k$-rank approximation to $K$ is at least as difficult as multiplying the input data matrix $A \in R {n \times d}$ by an arbitrary matrix $C \in R {d \times k}$. Barring a breakthrough in fast matrix multiplication, when $k$ is not too large, this requires $\Omega(nnz(A)k)$ time where $nnz(A)$ is the number of non-zeros in $A$. This lower bound matches, in many parameter regimes, recent work on subquadratic time algorithms for low-rank approximation of general kernels [MM16,MW17], demonstrating that these algorithms are unlikely to be significantly improved, in particular to $O(nnz(A))$ input sparsity runtimes.